Efficient Discovery of Correlated Patterns in Transactional Databases Using Items' Support Intervals
Abstract
Correlated patterns are an important class of regularities that exist in a transactional database. CoMine uses a pattern-growth technique to discover the complete set of correlated patterns that satisfy the user-defined minimum support and minimum all-confidence constraints. The technique compacts the database into an FP-tree and mines it recursively by building a conditional pattern base (CPB) for each item (or suffix pattern) in the FP-tree. The CPB of a suffix pattern in CoMine represents the complete set of prefix paths in the FP-tree that co-occur with it. Thus, CoMine implicitly assumes that the suffix pattern can be concatenated with every item in its prefix paths to generate higher-order correlated patterns. It has been observed that this assumption can cause performance problems in CoMine. This paper makes an effort to improve the performance of CoMine by introducing a novel concept known as items’ support intervals. The concept states that an item in the FP-tree can generate higher-order correlated patterns by concatenating with only those items in its prefix paths whose supports lie within a specific interval. We call the proposed algorithm CoMine++. Experimental results on various datasets show that CoMine++ can discover highly correlated patterns effectively.
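The support-interval idea can be illustrated with a minimal Python sketch. It is not the CoMine++ implementation: it scans a toy transaction list instead of building an FP-tree, and the interval is derived here from the standard all-confidence definition (pattern support divided by the largest single-item support), so the paper's exact interval may differ. The function names and the toy data are hypothetical.

```python
# Toy database: each transaction is a set of items (hypothetical data).
transactions = [
    {"a", "b"}, {"a", "b", "c"}, {"a", "c"},
    {"a", "d"}, {"a"}, {"b", "c"},
]

def support(itemset, db):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in db if items <= t)

def all_confidence(itemset, db):
    """Pattern support divided by the largest single-item support."""
    largest = max(support([i], db) for i in itemset)
    return support(itemset, db) / largest if largest else 0.0

def support_interval(item_support, min_all_conf):
    """Assumed derivation: sup(X) <= the smallest item support in X, so
    all-confidence >= min_all_conf needs min_sup_item / max_sup_item
    >= min_all_conf. A partner's support must therefore lie in
    [s * min_all_conf, s / min_all_conf]."""
    return item_support * min_all_conf, item_support / min_all_conf

def candidate_partners(item, db, min_sup, min_all_conf):
    """Items that pass the minimum support and the interval test."""
    items = sorted(set().union(*db))
    sup = {i: support([i], db) for i in items}
    low, high = support_interval(sup[item], min_all_conf)
    return [j for j in items
            if j != item and sup[j] >= min_sup and low <= sup[j] <= high]
```

With `min_sup=2` and `min_all_conf=0.7`, item `b` (support 3) keeps only `c` as a partner: `a` is frequent, but its support of 5 falls outside the interval, which agrees with `all_confidence(["a", "b"], transactions)` being 0.4. The interval is only a necessary condition, so surviving candidates still need their actual all-confidence checked.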
Similar resources
Discovering Periodic-Frequent Patterns in Transactional Databases
Since mining frequent patterns from transactional databases involves an exponential mining space and generates a huge number of patterns, efficient discovery of user-interest-based frequent pattern set becomes the first priority for a mining algorithm. In many real-world scenarios it is often sufficient to mine a small interesting representative subset of frequent patterns. Temporal periodicity...
Discovering Diverse-Frequent Patterns in Transactional Databases
In the area of data mining, the process of frequent pattern extraction finds interesting information about the association among the items in a transactional database. The notion of support is employed to extract the frequent patterns. Normally, a frequent pattern may contain items which belong to different categories of a particular domain. The existing approaches do not consider the notion of...
MaRFI: Maximal Regular Frequent Itemset Mining using a pair of Transaction-ids
Frequent pattern mining is the fundamental and most dominant research area in data mining. Maximal frequent patterns are one of the compact representations of frequent itemsets. There are many algorithms for finding maximal frequent patterns that are suitable for mining transactional databases. Users are not only interested in occurrence frequency but may also be interested in frequent patterns tha...
Closed Regular Pattern Mining Using Vertical Format
Discovering interesting patterns in transactional databases is often challenging because of the length of the patterns and the number of transactions, which make mining prohibitively expensive in both time and space. Closed itemset mining derives from traditional frequent pattern mining and has its own importance in data mining applications. Recently, regular itemset mining gained a lot of ...
Data sanitization in association rule mining based on impact factor
Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations about the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...
Publication date: 2012